Mining Cuboid Outliers in Information Networks
Abstract
The study of complex networks, or graphs, has been pursued extensively by researchers from multiple disciplines. Historically, the well-known “Seven Bridges of Königsberg” problem was the first real-world problem solved through the study of networks. Since then, there has been significant theoretical advancement in this area. Network science has been explored in diverse fields such as sociology, biology, physics, and computer science, and it spans a plethora of applications such as link analysis, community detection, and influence analysis. Only recently has detecting outliers in such information networks caught the attention of researchers in the data mining community. Outlier detection in information networks is the focus of this thesis. Outliers are structures in networks that exhibit unusual patterns and deviate significantly from the rest of the data. Outliers in graphs can appear in different forms, such as nodes, edges, subgraphs, and communities. In this thesis, we propose a novel outlier structure in networks: graph cuboid outliers. We put forward the novel problem of graph cuboid outlier detection in both static and time-evolving networks and design efficient algorithms for it. Recently, there has been interest in finding outliers with respect to a given query. Query-based outlier detection is preferable to general outlier detection because it gives the user the flexibility to find outliers that follow a particular schema and predicates encoded in the form of a query. Given a heterogeneous network and a query, we detect Graph Cuboid Outliers (GCOutliers). Graph cuboids are semantically related regions in a network that help us view the network along multiple dimensions and at multiple levels. GCOutliers are those cuboids that exhibit anomalous behavior with respect to the given subgraph query. An example of a GCOutlier would be those research areas that have unusual collaborations in a DBLP network.
Further, we can obtain such regions with respect to specific requirements, such as finding those research areas where a large number of papers are published with at least five authors. The requirement is modeled as a subgraph query to obtain query-sensitive graph cuboid outliers. Detecting GCOutliers poses several challenges: (i) the number of cuboids can be high, (ii) the number of matches can be large, and (iii) subgraph isomorphism is an NP-hard problem. In our solution, we address the issue of the large number of matches. This is done by designing a naïve random sampling method, followed by a more principled sampling method that uses regression models for nodes and edges. We next turn our attention to time-evolving, or temporal, networks. For a series of snapshots of a heterogeneous temporal network, we propose to detect Evolutionary Graph Cuboid Outliers (EGCOutliers). EGCOutliers are those cuboids which have an